Socioeconomic status and stroke severity: Understanding indirect effects via risk factors and stroke prevention using innovative statistical methods for mediation analysis

Background Those with low socioeconomic status have an increased risk of stroke, more severe strokes, reduced access to treatment, and more adverse outcomes after stroke. The question is why these differences are present. In this study we investigate the extent to which the association between low socioeconomic status and stroke severity can be explained by differences in risk factors and stroke prevention drugs. Methods The study included 86 316 patients registered with an ischemic stroke in the Swedish Stroke Register (Riksstroke) 2012–2016. Data on socioeconomic status was retrieved from the Longitudinal integrated database for health insurance and labour market studies (LISA) by individual linkage. We used education level as a proxy for socioeconomic status, with primary school education classified as low education. Stroke severity was measured using the Reaction Level Scale, with values above 1 classified as severe strokes. To investigate the pathways via risk factors and stroke prevention drugs we performed a mediation analysis estimating indirect and direct effects. Results Low education was associated with an excess risk of a severe stroke compared to mid/high education (absolute risk difference 1.4%, 95% CI: 1.0%-1.8%), adjusting for confounders. Of this association 28.5% was an indirect effect via risk factors (absolute risk difference 0.4%, 95% CI: 0.3%-0.5%), while the indirect effect via stroke prevention drugs was negligible. Conclusion Almost one third of the association between low education and severe stroke was explained by risk factors, and clinical effort should be made to reduce these risk factors to decrease stroke severity among those with low socioeconomic status.


Background
Stroke is the third leading cause of death and disability combined, and the second leading cause of death worldwide [1]. Strokes are mainly classified as either ischemic, caused by a clot in a cerebral blood vessel, or hemorrhagic, caused by a bleed in the brain. Almost half of all stroke-related mortality may be due to modifiable risk factors, where only 10 risk factors account for around 90% of the total modifiable risk for stroke [2,3]. Differences in age-standardized rates of stroke disability and mortality that are attributable to modifiable risk factors are seen between countries of different income levels, where those with high income are at less risk than those with lower income [4].
Even within countries, socioeconomic status (SES) is an important factor when considering stroke risk, care and outcomes. SES is defined as an individual's economic and social position relative to others, typically measured by education, occupation and income [5]. Differences in stroke risk, severity and outcomes depending on SES are well established, where those with lower SES are at greater risk of a stroke and suffer more adverse outcomes [3,6–10]. Even in Sweden, a country with relatively limited income inequalities, and publicly financed health care, SES differences in stroke care are present where those with lower SES have poorer access to stroke unit care, acute treatment and secondary prevention [11–13].
The underlying reasons for these differences remain largely unclear and, more importantly, so does the question of how they can be prevented. Attempts to explain SES differences in stroke have often included adjusting for possible intermediate factors and investigating the extent to which this attenuates the effects of SES. One example is a large meta-analysis which found that modifiable vascular risk factors accounted for around 50% of the increased risk of stroke for those with low SES [14]. It has been argued that this risk may be higher since patients with lower SES are more likely to be exposed to more risk factors, and risk factors in combination may have a multiplicative effect [15]. An alternative to adjustment for possible intermediate factors is performing a mediation analysis where the effect of SES is separated into indirect and direct effects [16]. The indirect effect(s) then capture the effect of SES on the outcome that operates through some intermediate variable(s) (mediators) of interest (e.g. risk factors), while the direct effect captures the effect of SES on the outcome that does not operate through the intermediate variable(s). This separation can then give an idea of, for example, how much of the SES difference would remain if we could intervene on the mediators.
Previous studies suggest that patients with low SES have more severe strokes than patients with high SES [6,8,17], and that a substantial part of SES differences in survival up to 3 months after stroke could be explained by differences in stroke severity [18]. In this nationwide, register-based study we investigate the extent to which differences in modifiable risk factors and access to stroke prevention drugs explain the relationship between low SES and stroke severity.

Methods
Requests to access the data set from qualified researchers trained in human subject confidentiality protocols may be sent to the Swedish Stroke Register (Riksstroke) at riksstroke@regionvasterbotten.se

Setting
In Sweden, primary and secondary healthcare is provided to the population through 21 different regions. Healthcare is mostly tax funded, apart from a small co-payment made by the patient. Acute stroke care is provided at 72 hospitals across the country [19].

Study design
This retrospective register-based cohort study included all adult patients registered in Riksstroke, a quality register for hospital stroke care, with an acute ischemic stroke (International Classification of Diseases, Tenth Revision: I63) in Sweden between 2012 and 2016. Riksstroke retrieved the study population. All 72 hospitals that provide acute stroke care register patients in Riksstroke and the register has been shown to cover up to 96% of all patients treated in hospital for acute stroke, meaning that the risk of selection bias is minimal [19]. Annually there are approximately 20 000 strokes registered, with a decreasing trend over time [19]. The quality register holds information on temporal data (stroke onset time, admission time etc.), patient characteristics (sex, age, cardiovascular risk factors, stroke severity, primary prevention etc.), acute care, secondary prevention and outcomes (mortality, functional outcomes etc.). At the acute stage information is collected by hospital staff. Patients and next of kin are informed about the registration and aim of the register and their right to decline participation (opt-out consent). Data are automatically checked upon entry and data validity is evaluated continuously. Riksstroke data were linked with data from the Longitudinal integrated database for health insurance and labour market studies (LISA) managed by Statistics Sweden, from which data on socioeconomic status was obtained. The linkage was performed through the personal identity numbers (Swedish national identification numbers) of the patients and was done by Statistics Sweden. Ethical approval was obtained from the regional ethics review board in Umeå, Sweden (reference number 2017/184-31). Results were reported according to the Reporting of Studies Conducted Using Observational Routinely Collected Health Data (RECORD) Statement (S1 Table).

Statistical methods
Variables. For our main exposure, low SES, we used low education as a proxy. The rationale behind using education level rather than e.g. occupation or income as the measure of SES is that it tends to be more stable across the life course and that education is related to both material and non-material resources [20]. We defined the exposure as having low education (only primary school) vs. having mid/high education (secondary school/university). For our outcome, stroke severity, we used level of consciousness upon arrival to the hospital, measured using the Reaction Level Scale (RLS), as a proxy. An RLS score of 1 (alert) was contrasted with RLS scores of 2-8 (lowered consciousness). We used the directed acyclic graph (DAG) methodology to create an illustration of the relationship between education level and stroke severity, including possible confounders and mediators [21]. We sorted variables into three groups: "baseline confounders" (sex, age, year of stroke), "risk factors" (smoking, diabetes, atrial fibrillation, previous stroke, activities in daily living (ADL) dependency (defined as inability to move around indoors, manage dressing, and/or using the bathroom without assistance) at time of stroke) and "stroke prevention drugs" (antihypertensives, statins, antiplatelets, anticoagulants), with the latter two hypothesized to lie on the pathway between education and stroke severity. The variables screened for inclusion were based on clinical experience, but ultimately decided upon depending on availability in the register. See Fig 1 for the DAG and all included variables.
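As an illustration of the variable coding described above, the following Python sketch dichotomises education and RLS. It is not the authors' code (their analyses were run in R); the category labels and function names are hypothetical and are not taken from Riksstroke or LISA.

```python
def is_low_education(level):
    """Exposure: True for primary school only, False for secondary school or university."""
    allowed = {"primary", "secondary", "university"}  # hypothetical category labels
    if level not in allowed:
        raise ValueError(f"unknown education level: {level!r}")
    return level == "primary"


def is_severe_stroke(rls):
    """Outcome: RLS 1 (alert) is non-severe, RLS 2-8 (lowered consciousness) is severe."""
    if not 1 <= rls <= 8:
        raise ValueError("RLS score must be between 1 and 8")
    return rls > 1
```

Rejecting out-of-range values, rather than silently coding them as non-severe, keeps missing or miscoded entries visible before any complete case filtering.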
Mediation analysis. We used a causal inference approach to mediation which, compared to the traditional approach [22], has the advantages that direct and indirect effects can be defined more generally, that the assumptions required to estimate effects from data are explicit, and that effects can be estimated using a range of different methods [23].
Different versions of direct and indirect effects have been suggested depending on the aim of the causal mediation analysis [23]. Here we estimate so-called interventional disparity effects [24] with a focus on how the disparity in stroke severity between patients with low education and patients with mid/high education might change if we were to intervene on the mediators in the "risk factor" and "stroke prevention drugs" pathways. We separated the total association between education and stroke severity into direct and indirect components. The direct component is represented by the interventional disparity direct effect, which corresponds to the extent to which the association between education and stroke severity would remain if the distributions of risk factors and stroke prevention drugs were made to be the same in patients with low education as in patients with mid/high education [24]. The interventional disparity indirect effect corresponds to the extent to which the stroke severity of patients with low level of education would change had the distributions of their mediators been changed to those of patients with mid/high level of education. We estimate the interventional disparity indirect effects through all mediators taken jointly, as well as through the mediators in the "risk factor" and "stroke prevention drugs" pathways separately and through the dependence between the mediators in the two pathways.
To gain an idea of the relations in Fig 1 we estimated the associations between the exposure low education and the outcome stroke severity and each of the mediators, as well as the associations between each of the mediators and stroke severity using separate logistic regression models with and without adjustment for other covariates. Descriptive statistics are presented for age categories, but the analysis models include age and age squared as continuous covariates.
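A logistic regression coefficient translates into an odds ratio with a Wald-type confidence interval by exponentiation. The short Python sketch below shows that conversion; the coefficient and standard error are hypothetical values chosen for illustration, not estimates from this study, and the function name is our own.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error, for illustration only.
or_, lower, upper = odds_ratio_ci(beta=0.15, se=0.02)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the interval is symmetric on the log-odds scale, the odds ratio is the geometric mean of the two confidence limits.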
The total association and direct and indirect effects were estimated through Monte Carlo simulation based on logistic regression models for the outcome, given exposure, mediators and confounders, and mediators, given exposure and confounders [24–26]. The general estimation procedure is described in detail elsewhere [24]. All 2-way interactions and age squared were included in the models to make them flexible, reducing the risk of model misspecification bias [26]. The robustness of the results to alternative model specifications was checked with analyses based on models with only main effects. Standard errors of the effects were estimated through bootstrap.
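To make the simulation idea concrete, here is a deliberately simplified Python sketch with a single binary mediator. It is not the authors' procedure, which was run in R with several mediators in two pathways and is described in [24]. The function names, the model coefficients and the choice to average over the supplied confounder values are all assumptions made for illustration.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def interventional_effects(confounders, p_mediator, p_outcome, n_draws=100, seed=1):
    """Monte Carlo estimate of total, direct and indirect risk differences.

    p_mediator(x, c) -> P(M=1 | X=x, C=c); p_outcome(x, m, c) -> P(Y=1 | X=x, M=m, C=c).
    Both stand in for fitted regression models (hypothetical here).
    """
    rng = random.Random(seed)
    y11 = y10 = y00 = 0.0
    for c in confounders:
        for _ in range(n_draws):
            m1 = int(rng.random() < p_mediator(1, c))  # mediator drawn as for the exposed
            m0 = int(rng.random() < p_mediator(0, c))  # mediator drawn as for the unexposed
            y11 += p_outcome(1, m1, c)  # exposed, own mediator distribution
            y10 += p_outcome(1, m0, c)  # exposed, mediator distribution of the unexposed
            y00 += p_outcome(0, m0, c)  # unexposed, own mediator distribution
    n = len(confounders) * n_draws
    y11, y10, y00 = y11 / n, y10 / n, y00 / n
    return {"total": y11 - y00, "direct": y10 - y00, "indirect": y11 - y10}


# Hypothetical models with one binary confounder c and one binary mediator m.
p_m = lambda x, c: expit(-1.0 + 0.5 * x + 0.3 * c)
p_y = lambda x, m, c: expit(-2.5 + 0.2 * x + 0.8 * m + 0.4 * c)
effects = interventional_effects([0, 1] * 500, p_m, p_y)
print(effects)
```

In this sketch the direct and indirect components add up to the total by construction. Averaging predicted outcome probabilities, instead of also drawing the outcome, is a design choice that reduces simulation noise.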
The estimation procedure combines Monte Carlo simulation and bootstrap and is thus very computer intensive. For this reason, we opted to perform complete case analyses, meaning that patients with missing information on any variable were excluded, rather than adding multiple imputation steps to the procedure. A sensitivity analysis was performed, where the effects were re-estimated based on an imputed data set (single stochastic imputation using chained equations with 10 burn in iterations) [27].
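The computational cost mentioned above arises because every bootstrap replicate repeats the full estimation. The Python sketch below illustrates a nonparametric bootstrap standard error for a generic estimator. A plain risk difference stands in for the Monte Carlo mediation estimator, and the toy data, function names and number of replicates are assumptions, not the authors' settings.

```python
import random
import statistics


def risk_difference(records):
    """Risk among exposed minus risk among unexposed; records are (exposure, outcome) pairs."""
    exposed = [y for x, y in records if x == 1]
    unexposed = [y for x, y in records if x == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def bootstrap_se(records, estimator, n_boot=500, seed=0):
    """Standard deviation of the estimator over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(records)
    estimates = []
    for _ in range(n_boot):
        resample = [records[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))  # the full estimation is repeated here
    return statistics.stdev(estimates)


# Toy data (hypothetical): 200 exposed with 40 events, 200 unexposed with 30 events.
records = [(1, 1)] * 40 + [(1, 0)] * 160 + [(0, 1)] * 30 + [(0, 0)] * 170
se = bootstrap_se(records, risk_difference)
print(f"bootstrap SE of the risk difference: {se:.3f}")
```

If the estimator itself runs many Monte Carlo draws per patient, the total cost scales with the product of bootstrap replicates, patients and draws, and multiple imputation would multiply it again.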
All analyses were performed in R [28]. The code is available from the corresponding author upon request.

Results
Baseline data
We identified 101 261 patients with ischemic stroke in 2012-2016, of which 86 316 had complete records (i.e. no missing values) on all study variables and were included in this study. The total proportion of patients with any missing value was 14.8% with the largest proportions of missing values observed for smoking (9.5%) and ADL-dependency at baseline (3%), with less than 2% missing values for all other variables (S2 Table).
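The complete case selection and the overall proportion excluded can be checked with a few lines of Python. The record layout and field names in the filter are hypothetical; only the two patient counts come from the text.

```python
def complete_cases(records, required):
    """Keep only records with a non-missing value for every required variable."""
    return [r for r in records if all(r.get(k) is not None for k in required)]


# Counts reported in the text.
n_identified = 101_261
n_complete = 86_316
pct_missing = 100 * (n_identified - n_complete) / n_identified
print(f"{n_identified - n_complete} patients excluded ({pct_missing:.1f}%)")
```

The 14 945 excluded patients correspond to the 14.8% with any missing value stated above.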
Low education was associated with an increased risk of having lowered consciousness upon arrival, also after adjustment for confounders (OR: 1.16, 95% CI 1.11-1.21, Table 2). After adjustment, low education was also associated with an increased risk of all risk factors except atrial fibrillation (0.98, 0.95-1.02). The largest increases in risk were seen for smoking, diabetes and ADL-dependency. For stroke prevention drugs, low education was associated with an increased chance of receiving prophylactic treatment, except for anticoagulants, where low education was associated with a decreased chance of receiving treatment (0.92, 0.88-0.96).
All risk factors except for smoking were associated with an increased risk of lowered consciousness (Table 2). The greatest increase was associated with ADL-dependency at baseline (2.88, 2.74-3.03). For stroke prevention drugs the results were more disparate, with small associations between lowered consciousness and antihypertensive treatment and antiplatelets, decreased risk of lowered consciousness for patients with statins (0.89, 0.85-0.93) and increased risk for patients with anticoagulants (1.09, 1.02-1.16).

Indirect and direct effects of education on stroke severity
Low education was associated with an excess risk of lowered consciousness of 1.4% (95% CI: 1.0%-1.8%) compared to mid/high education, adjusting for confounders (Table 3). The direct effect was 1.0% (0.6%-1.4%), meaning that 71.3% of this association would remain if risk factors and stroke prevention drugs among patients with low education had the same distributions as among patients with mid/high education with the same sex, age and year of stroke. The indirect effect through all mediators taken jointly was 0.4% (0.3%-0.5%), which corresponds to the reduction in risk of severe stroke in patients with low education, if their risk factors and stroke prevention drugs had the same distributions as those of patients with mid/high level of education with the same sex, age and year of stroke.
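The shares quoted above follow from dividing each component by the total association. The Python sketch below redoes that arithmetic from the rounded risk differences; it gives 28.6% and 71.4%, which differ slightly from the reported 28.5% and 71.3%, presumably because the published shares were computed from unrounded estimates. The function name is our own.

```python
def share_of_total(component, total):
    """Percentage of the total association accounted for by one component."""
    if total == 0:
        raise ValueError("total association must be non-zero")
    return 100.0 * component / total


# Rounded absolute risk differences in percentage points, as reported in the text.
total, direct, indirect = 1.4, 1.0, 0.4

print(f"indirect share: {share_of_total(indirect, total):.1f}%")
print(f"direct share: {share_of_total(direct, total):.1f}%")
print(f"excess severe strokes per 10 000 patients: {round(total * 100)}")
```

Expressed per 10 000 patients, an absolute risk difference of 1.4 percentage points corresponds to 140 additional severe strokes.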
Nearly all of the indirect effect was through the risk factor pathway, while the indirect effects through stroke prevention drugs and the dependence between the mediators in the two pathways were small (Table 3).
When using alternative regression models only including main effects the results were similar, with a slightly larger adjusted total association and direct effect (S3 Table). Repeating the main analyses on singly imputed data also gave similar results (S4 Table).

Discussion
We found that low SES, measured by low education, was associated with more severe strokes, where patients with low education were found to have 140 additional severe strokes per 10 000 patients over the study period compared to patients with mid/high education, adjusting for confounders. Almost 30% of this association was found to be attributable to an indirect effect via risk factors. Looking at the pathway we saw that diabetes, previous stroke and ADL-dependency were associated with both low education and more severe strokes. Differences in stroke prevention drugs could not explain the association between low SES and stroke severity.
The pathways between SES and outcomes after stroke are complex and here we have focused on the pathways that connect SES and stroke severity. To our knowledge this is the first study to investigate the extent to which risk factors and stroke prevention drugs contribute to the SES-stroke severity pathway. Our results indicate that risk factors should be a target to decrease stroke severity among those with low SES. From a clinical point of view, we think that targeting diabetes would be beneficial in reducing the inequality in stroke severity among different levels of SES. Type 2 diabetes accounts for 85-90% of all diabetes in Sweden [29]. Lifestyle factors have a large impact on the risk of type 2 diabetes; one study found that 9 out of 10 new cases of diabetes in older adults could be attributed to the lifestyle factors obesity, physical activity, diet, smoking and alcohol intake [30]. It is unclear whether physical activity differs between levels of SES, while there is evidence that obesity, smoking, worse diet and more alcohol intake are associated with lower SES [31–34]. Hence, primary prevention of diabetes targeting these risk factors is desirable.
Table 2 notes. The exposure-outcome and exposure-mediator models adjust for the confounders age, sex and year of stroke. The mediator-outcome models adjust for the confounders and the exposure low education. b Defined by lowered consciousness upon hospital arrival. https://doi.org/10.1371/journal.pone.0270533.t002
Table 3. Adjusted total association, direct and indirect effects estimated as absolute risk differences (excess risks).
Regarding the influence of ADL-dependency on stroke severity, previous studies have shown that those with low SES are at greater risk of being ADL-dependent [35]. Behavioral risk factors and co-morbidity seem to play a major role in these inequalities [36]. An aim for future studies using mediation analysis would be to target why ADL-dependency is more prevalent among those with low SES.
A major strength of our study is that it covers an unselected population since all hospitals that care for patients with acute stroke in Sweden are included and that Riksstroke has been shown to have excellent coverage [19]. Data is also prospectively collected and regularly validated, which together with low levels of missing data vouch for high data quality [37].
Due to the computationally intensive estimation method, the use of multiple imputation was deemed infeasible, and a complete case analysis was therefore performed. The proportion of missing data in the study was relatively low, and sensitivity analysis using single imputation showed similar results. Missing data is unlikely to have impacted the results.
A drawback of observational studies in general and in our study is that residual confounding cannot be completely ruled out. For the methods used in this study we assume that there are no unobserved confounders of the mediator-outcome relationships, i.e., no unobserved confounders of the risk factor-stroke severity or stroke prevention drugs-stroke severity pathways. In our analyses we have adjusted for the baseline confounders age and sex as well as year of stroke to minimize any confounding temporal effects. By the retrospective design we are restricted to the predefined set of variables that are collected in the register, which limits both the possibilities to adjust for confounding and the potential mediators that can be included in the study. For example, there is no information regarding stroke volume and location, adherence to medication or laboratory tests, and information on medication with antihypertensives is used as a proxy for whether the patient has hypertension or not. Except for smoking, information on lifestyle factors is not registered in Riksstroke and was therefore not included in the risk factor pathway. It has been shown that a healthy diet, a physically active lifestyle and lower alcohol consumption are associated with less severe strokes [2,38,39]. It is therefore likely that the mediating role of risk factors is underestimated in our study. Future studies combining Riksstroke data with health surveys that collect a wider variety of lifestyle variables could shed further light on the role of risk factors in combination with SES.
We have used level of consciousness as a proxy for stroke severity. A variable with higher resolution would be NIHSS, but levels of missing data are >50% for this variable compared to 1.6% for level of consciousness. Rather than performing multiple imputation of NIHSS, which requires the additional unverifiable assumption that data are missing at random (MAR) [27], we opt for using the more complete variable. Level of consciousness has been shown to be a good proxy for NIHSS in predicting death after stroke [40].
We used education as proxy for SES. Education is associated with social status and may be related to both access to material and non-material resources such as knowledge that affects health behaviors, while income relates more directly to economic status and material resources [20]. An advantage to using education is that it is established early in life and is less variable across the lifespan compared to income, which is particularly important in an elderly population group. Furthermore, in the current study we used an individual measure of SES, but an area-based measure could add further insights into the importance of the context that a person inhabits [41,42].
Different versions of direct and indirect effects have been suggested [23]. We have estimated so-called interventional disparity direct and indirect effects via risk factors and/or stroke prevention drugs [24]. These effect types have the advantage that path-specific effects can be estimated without having to assume that there are no unobserved common causes of the mediator variables, an assumption which is unlikely to be fulfilled in practice, and they are also valid even when the true underlying direction of associations between the mediators is unknown [26]. By using a disparity focused approach we shift the focus from the effect of intervening on SES directly, which is difficult in practice, to more policy relevant questions regarding what would happen to the SES disparity if interventions on intermediate variables were implemented.
The estimation of the effects is based on regression models and the results therefore vary depending on how these are specified. To reduce the risk of model misspecification bias we specified flexible models including interactions and higher order terms. To check the robustness of our results these were compared to analyses based on simple models only including main effects and no substantial differences were found.
A large proportion of the association between low education and severe stroke was a direct effect, not operating through either the risk factor or stroke prevention drugs pathways. To shift the ratio from direct to indirect effects and hence offer more explanation of the relationships, the data resolution should be higher and include a larger set of possible factors of interest.
Finally, our study is set in Sweden where both education and health care are publicly financed. It is important to note that the generalizability of our findings may be restricted to countries with similar population demographics and welfare systems.

Conclusion
We found that almost 30% of the increased risk of a more severe stroke among low SES patients stems from differences in risk factors, while the effect of stroke prevention drugs was negligible. Hence, risk factors should be addressed more aggressively by clinicians to decrease stroke severity in those with low education. We also hope to inspire a more widespread use of mediation analysis by showing the potential in elucidating complex relationships between SES and health outcomes.

Supporting information
S1 Table. The RECORD statement: checklist of items, extended from the STROBE statement, that should be reported in observational studies using routinely collected health data.
Table. Adjusted total association, direct and indirect effects estimated as absolute risk differences (excess risks) based on singly imputed data (stochastic imputation using chained equations with 10 burn in iterations). Estimates based on 500 Monte Carlo simulations. (DOCX)